Error-Tolerant Clustering of Gene Microarray Data
Author: Jay Cahill
Abstract
Gene microarray technology allows for unprecedented and massive production of biological data across multiple experimental conditions and in time series. Computer analysis of this data can help guide biological bench work toward the assignment of gene function and the classification of cells and tissues, and can ultimately assist in the diagnosis and treatment of disease. One approach to the analysis of microarray data is the identification of groups of genes with common expression patterns, or "clusters". The author implements an error-tolerant clustering algorithm due to Amir Ben-Dor, Ron Shamir and Zohar Yakhini. In their original paper, they define a stochastic error model for microarray data and, based on that model, prove that their algorithm recovers the underlying cluster structure of microarray data with high probability. In this paper, their results are replicated on artificial data. In addition, the author tests the stability of clusterings generated by the algorithm and compares the use of discretized and non-discretized similarity graphs.

Student: Jay Cahill, Degree Candidate 2002, Bachelor of Arts in Computer Science, Boston College. Email: [email protected]. Tel: (617) 308-7218.
Advisor: Peter Clote, Ph.D., Doctorat d'Etat, Professor of Computer Science, Dept of Computer Science and Dept of Biology, Boston College. Email: [email protected]. Tel: (617) 552-1332.

Note: This paper has been modified from its original form. Interest in the original should be directed to Peter Clote, [email protected], or Jay Cahill, [email protected].

Part I: Motivation and Results
This paper is organized into three parts. The first part defines the motivation and idea behind the project and presents the results. The second part serves as a user guide and documentation to support future maintenance and extension of the CAST software. The third part offers conclusions and ideas for further study.
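
The abstract names the CAST software and the comparison of discretized and non-discretized similarity graphs but does not spell out the procedure. The following is a minimal sketch of the commonly described CAST heuristic (grow one cluster at a time by adding high-affinity elements and removing low-affinity ones), not the author's implementation. The function names `cast` and `discretize`, the parameter names `t`, `cutoff` and `max_steps`, and the tie-breaking rule are all assumptions made for illustration.

```python
def discretize(similarity, cutoff):
    """Hypothetical helper: turn a real-valued similarity matrix into a 0/1 graph."""
    return [[1.0 if s >= cutoff else 0.0 for s in row] for row in similarity]


def cast(similarity, t, max_steps=10_000):
    """Sketch of a CAST-style heuristic (assumed form, not the thesis code).

    similarity: square list-of-lists with values in [0, 1]
    t: affinity threshold; an element belongs in the open cluster C when
       its affinity (sum of similarities to members of C) is >= t * |C|
    max_steps: safety cap, since the add/remove loop is only a heuristic
    """
    n = len(similarity)
    unassigned = set(range(n))
    clusters = []
    while unassigned:
        cluster = set()
        affinity = [0.0] * n  # affinity of every element to the open cluster
        steps = 0
        changed = True
        while changed and steps < max_steps:
            changed = False
            # ADD: move the highest-affinity unassigned element into the cluster.
            while unassigned:
                u = max(sorted(unassigned), key=lambda x: affinity[x])
                if affinity[u] < t * len(cluster):
                    break
                unassigned.remove(u)
                cluster.add(u)
                for x in range(n):
                    affinity[x] += similarity[x][u]
                changed = True
                steps += 1
            # REMOVE: drop the lowest-affinity member if it falls below threshold.
            while cluster:
                v = min(sorted(cluster), key=lambda x: affinity[x])
                if affinity[v] >= t * len(cluster):
                    break
                cluster.remove(v)
                unassigned.add(v)
                for x in range(n):
                    affinity[x] -= similarity[x][v]
                changed = True
                steps += 1
        if not cluster:
            # Guard so the outer loop always makes progress.
            cluster.add(unassigned.pop())
        clusters.append(sorted(cluster))
    return clusters


# Toy data (made up for illustration): two groups, {0, 1, 2} and {3, 4}.
groups = [0, 0, 0, 1, 1]
S = [[1.0 if i == j else (0.9 if groups[i] == groups[j] else 0.1)
      for j in range(5)] for i in range(5)]

clusters_real = cast(S, t=0.5)                      # non-discretized graph
clusters_binary = cast(discretize(S, 0.5), t=0.5)   # discretized graph
```

On this toy matrix both the real-valued and the thresholded similarity graph yield the same two groups; on noisy data the two inputs can disagree, which is the kind of comparison the abstract describes.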